Description

The present invention relates to a novel gene encoding one of the proteins, called Smad proteins, that intracellularly transduce stimuli elicited by physiologically active substances belonging to the transforming growth factor beta (hereinafter referred to as TGFβ) family.

The TGFβ family is a group of peptidic physiologically active substances widely distributed in the animal kingdom. This family includes very important physiologically active peptides such as TGFβ, found as a substance involved in the proliferation of tumor cells; bone morphogenetic protein (BMP), which has a significant function in bone formation in vertebrates including humans; inhibin, which regulates the secretion of follicle-stimulating hormone from the pituitary gland; activin, which possesses activity as an erythroid differentiation factor; and glial cell-derived neurotrophic factor (Bock, G. R. and Marsh, J. eds., 1991, Clinical Applications of TGFβ, Ciba Foundation Symposium, John Wiley & Sons). The first clue to how TGFβ family peptides act on cells came from cDNA cloning of the receptors for these peptides and from the determination of their nucleotide sequences and the amino acid sequences deduced therefrom. Receptors for this family are transmembrane proteins with protein phosphotransferase (protein kinase) activity specific for serine (and threonine) residues. This demonstrates that phosphorylation of intracellular proteins is involved in the transduction of stimuli from the TGFβ family (Sporn, M. B. and Roberts, A. B. eds., 1990, "Peptide Growth Factors and Their Receptors", Parts I and II, Springer-Verlag, Berlin).

As factors mediating stimuli from the TGFβ family, the genes designated Mad and Sma have been known in Drosophila and Nematoda, respectively. In recent years, several genes homologous to Mad and Sma have been found in several vertebrates including humans; their cDNAs have been cloned and their nucleotide sequences determined. These genes and the proteins they encode were named Smad, and to date Smad1, Smad2, Smad3, Smad4, Smad5 and Smad6 have been reported (Derynck, R. et al., 1996, "Nomenclature: Vertebrate mediators of TGFβ family signals", Cell, 87, 173). Furthermore, during early embryogenesis, Smad1 is known to be essential for the fundamental determination of which side of the embryo becomes dorsal and which becomes ventral (Graff, J. M. et al., 1996, Cell, 85, 479-487). It has been shown that inactivation of the Smad2 gene is one of the causes of colorectal cancer in humans, while the Smad4 gene has been shown to be identical to the tumor suppressor gene DPC4, which is strongly associated with the suppression of pancreatic cancer (Eppert, K. et al., 1996, Cell, 86, 543-552; Hahn, S. A. et al., 1996, Science, 271, 350-353). Thus, Smad family proteins appear to be signal transduction factors that transduce stimuli from the physiologically active peptides of the TGFβ family, and they may also be profoundly involved in the development of cancer.

The known Smad family proteins are intracellular proteins of about 400-550 amino acid residues whose amino- and carboxy-terminal regions are relatively well conserved across the family. When a TGFβ family stimulus increases the kinase activity of the specific receptors, Smad proteins are rapidly phosphorylated and accumulate in the nucleus. There, at the site of gene transcription, Smad proteins regulate gene expression in a specific manner through a still unknown mechanism that involves oligomer formation between identical or different Smad molecules (Massagué, J., 1996, Cell, 85, 947-950).

In recent years, it has become clear that a variety of physiologically active substances such as TGFβ, including hormones and cytokines, ultimately function by regulating gene expression in their target cells. The specificity of each physiologically active substance is determined by the nature of its receptor and of the downstream signal transduction factors for that substance. In addition, a signal elicited by a single physiologically active substance often activates several kinds of signal transduction factors, so that the transduction pathway branches. Isolating signal transduction factors and elucidating their properties is therefore helpful for understanding the mechanisms through which various physiologically active substances function, and for using these factors as targets for pharmaceuticals.
As described above, TGFβ family members play very important roles in various physiological events, including growth control, immune responses, cell differentiation and embryonic morphogenesis. More than 50 physiologically active substances belonging to the TGFβ family are known, and they include substances whose deficiency or excess in quantity, or abnormality in quality, is known to be associated with pathologies related to these physiological events, such as cancer, autoimmune disease, osteoporosis, anemia and congenital malformations. Similarly, genetic analyses have shown that defects in the Smad family, which transduces stimuli (signals) from the TGFβ family, are involved in various abnormalities or pathologies, for example in cancer, the leading cause of death in developed countries including many in Europe and North America. For the prevention or treatment of cancer, it is desirable to identify all of the genes associated with cancer. However, relatively few Smads have been identified so far compared with the number of known physiologically active substances belonging to the TGFβ family, which suggests that unidentified Smad family members remain. Therefore, isolation of a novel Smad gene should make it possible to find further pathways involving the TGFβ family, and such a gene is expected to be useful as a diagnostic agent for detecting abnormalities, such as tumors, at the gene level.

The present invention aims to provide a novel factor belonging to the Smad family, which transduces signals of physiologically active substances of the TGFβ family, and a gene encoding said factor.
In view of the important role in biological responses of TGFβ peptides and of their signal transducers, the Smad family proteins, the present inventors screened cDNAs derived from Mus musculus in order to clone a novel Smad gene. As a result, cDNA clones corresponding to an mRNA encoding a novel Smad family protein were identified in a cDNA mixture derived from whole 17-day embryos. The present invention has been completed on the basis of this finding.

Specifically, the first object of the present invention is to provide a gene encoding a novel signal transduction factor belonging to the Smad family.

The second object of the present invention is to provide a protein encoded by the above gene, that is, a signal trans- 
duction factor. 


BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a gene map of the expression vector pactEF-mSmad7 for expression of Smad7 in animal cells.

Fig. 2 shows the result of denaturing polyacrylamide gel electrophoresis of a fusion protein between Smad7 and the Myc-tag peptide.

Fig. 3 is a gene map of the plasmid vector pIBIA-mSmad7, in which Smad7 cDNA has been cloned.
Fig. 4 shows the result of agarose gel electrophoresis of mRNA (sense-strand RNA) and antisense RNA for 
Smad7. 

Fig. 5 shows the result of denaturing polyacrylamide gel electrophoresis of Smad7 protein synthesized in vitro.
Fig. 6 shows a sequence comparison between Smad7 and Smad1.

(1) A gene encoding the signal transduction factor Smad7

The novel Smad of the present invention (Smad7) has the following characteristics.


1) coding region 

As shown in SEQ ID NO: 1, it consists of 1281 nucleotide pairs, including the TAG termination codon, and encodes the sequence of 426 amino acid residues shown in SEQ ID NO: 4.
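The length figures can be cross-checked: 1281 nucleotides form 427 codons, that is, 426 amino acid codons plus one termination codon. The following minimal Python sketch, assuming the standard genetic code, performs that arithmetic and translates the last 21 nucleotides of SEQ ID NO: 1 (positions 1261-1281); the result should match the C-terminal residues Val Ile Phe Asn Ser Arg of SEQ ID NO: 4 followed by a stop. The helper names are illustrative only.

```python
# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon; '*' marks a stop."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


CODING_LENGTH = 1281             # nucleotides in SEQ ID NO: 1
n_codons = CODING_LENGTH // 3    # 427 codons in total
n_residues = n_codons - 1        # the last codon is the termination codon

# Positions 1261-1281 of SEQ ID NO: 1, i.e. codons 421-427.
tail = "GTCATCTTCAACAGCCGGTAG"
tail_protein = translate(tail)

print(CODING_LENGTH % 3, n_codons, n_residues, tail_protein)
```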


2) 5' terminal non-coding region 

It comprises the 209 nucleotide pairs shown in SEQ ID NO: 2, and its 3' end is directly contiguous with the coding region described in 1).


3) 3' terminal non-coding region 

It comprises 20? nucleotide pairs, shown in SEQ ID NO: 3, and is linked immediately downstream of the coding region described in 1).

The cDNA for the novel Smad of the present invention (Smad7) was obtained by the procedures described below. First, a highly homologous region was identified among the amino acid sequences of the vertebrate Smad family members already reported. The amino acid sequence of such a highly homologous region is expected to be essential for an important function of the Smad family and should therefore also be conserved in unknown Smads. Accordingly, oligonucleotide primers for DNA amplification by the PCR (polymerase chain reaction) method (Saiki, R. et al., 1985, Science, 230, 1350-1354) were designed on the basis of the information about the highly homologous region and synthesized. A pool of cDNA mixture, prepared by a publicly known method (Kenji Okazaki, 1995, "mRNA-No-Chousei-Hou", In Shunsuke Uda et al. eds., "Meneki-Jikken-Sousa-Hou", vol. I, pp. 349-352, Nankodou) from polyadenylated RNAs of Mus musculus embryos and ligated to an adapter DNA, was used as the template source in the PCR. The PCR was performed using the primer oligonucleotide described above in combination with an oligomer specific to the adapter DNA. The partial amino acid sequence deduced from the nucleotide sequence of a PCR product was found to be homologous to the amino acid sequences of the Smad family proteins. Based on the nucleotide sequence thus obtained, oligomers corresponding to the 5' and 3' termini were synthesized and used in a PCR with the above cDNA mixture as the template source, yielding a cDNA containing the entire coding region. After this cDNA was cloned into a general-purpose plasmid vector, its nucleotide sequence was determined. Since the nucleotide sequence of this cDNA is now shown in SEQ ID NOs: 1, 2 and 3, this cDNA can also be obtained by synthesizing sense and antisense oligomers corresponding to the 5'- and 3'-termini of the DNA, respectively, and then performing a PCR with a Mus musculus embryo cDNA mixture as the template source.
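Because the terminal sequences are known, the two primers follow directly from them: the sense primer is the 5'-terminal sequence itself, and the antisense primer is the reverse complement of the 3'-terminal sequence. The sketch below derives an antisense primer from the last 21 nucleotides of the coding region of SEQ ID NO: 1 and estimates its melting temperature with the simple Wallace rule, Tm = 2(A+T) + 4(G+C) °C. The primer length and the Tm rule are illustrative assumptions, not the oligomers or design criteria actually used.

```python
def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


# Last 21 nt of the coding region in SEQ ID NO: 1 (sense strand, 5'->3').
coding_3prime_end = "GTCATCTTCAACAGCCGGTAG"
antisense_primer = reverse_complement(coding_3prime_end)

print(antisense_primer, wallace_tm(antisense_primer),
      round(gc_percent(antisense_primer), 1))
```

The Wallace rule is only a rough guide for primers of this length; nearest-neighbor methods are more accurate.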

When the amino acid sequence deduced from the cDNA sequence so obtained was compared with the known amino acid sequence of Smad1 (NCBI (U.S. National Center for Biotechnology Information) Identification numbers: 1332714, 1333647, 1381671, 1518645, and 1654323), the two sequences showed a relatively high homology of 65% in the C-terminal region, demonstrating that the obtained cDNA encodes a Smad family protein. In addition, the amino acid sequence clearly differs from that of every previously disclosed vertebrate Smad family member, Smad1 to Smad6 (NCBI Identification: Smad1: 1518645, 1658159, 1333647, 1654323, 1469308, 1438077, 1332714; Smad2: 1407782, 1575530; Smad3: 1673577, 1552532; Smad4: 1724091, 1163234; Smad5: 1518647, 1654325; Smad6: 1654327), and from the invertebrate Drosophila Mad (NCBI Identification: 1085150, 551489) and Nematoda Sma (NCBI Identification: 1173452, 1173453, 1173454). Thus, the present cDNA was identified as a cDNA for a novel Smad family protein and designated Smad7. The nucleotide sequence of the present cDNA is also clearly different from that of every previously disclosed Smad family cDNA (GenBank accession numbers: Smad1: U54826, U57456, U58992, U59912, U59423, U58834, L77888; Smad2: U59911, U60530, U65019, U68018, L77885; Smad3: U68019, U76622; Smad4: U79748, U44378; Smad5: U58993; Smad6: U59914).
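The 65% figure expresses how often residues agree over the compared C-terminal region. One common way to compute such a value, sketched here under the assumption that the two sequences have already been aligned (gaps written as '-'), counts identical residues over the aligned positions where neither sequence has a gap. The alignment method and gap convention actually used for the comparison are not stated, and the toy sequences below are hypothetical, not real Smad sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Positions where either sequence has a gap ('-') are excluded from
    the denominator; this is one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Hypothetical toy alignment: 7 of 9 gap-free positions are identical.
print(percent_identity("KGWGQCY-TR", "KGWGAEYSTR"))
```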

(2) Smad7 protein 

The Smad7 protein of the present invention has the following characteristics.

1) Amino acid sequence 

The amino acid sequence deduced from the above Smad7 cDNA nucleotide sequence is shown in SEQ ID NO: 4. 


2) Molecular weight 

The molecular weight of Smad7 protein calculated from the amino acid sequence shown in SEQ ID NO: 4 is 46516. 
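A molecular weight calculated from a sequence is normally the sum of the average masses of the residues plus one water molecule for the free termini. The sketch below uses commonly tabulated average residue masses (assumed values; the program and mass table used for the figure of 46516 are not stated, and different tables shift the result slightly). Because SEQ ID NO: 4 is only partly reproduced in this text, the example is run on short peptides rather than on the full 426-residue sequence.

```python
# Average residue masses in daltons (amino acid minus one H2O); assumed values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(round(molecular_weight("G"), 4))        # free glycine
print(round(molecular_weight("VIFNSR"), 2))   # C-terminal six residues of SEQ ID NO: 4
```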
3) Isoelectric point

The isoelectric point of Smad7 protein calculated from the amino acid sequence shown in SEQ ID NO: 4 is 8.3. 
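An isoelectric point calculated from a sequence is usually the pH at which the net charge predicted by the Henderson-Hasselbalch equation is zero. The sketch below finds that pH by bisection. The pKa set is an assumed, simplified one; published tables and software differ, and the choice can move the result noticeably, so this sketch is not meant to reproduce the value of 8.3 exactly. It is run on short peptides because SEQ ID NO: 4 is only partly reproduced here.

```python
# Assumed pKa values (simplified set); tools and textbooks use different tables.
PKA_N_TERM, PKA_C_TERM = 9.69, 2.34
PKA_POSITIVE = {"K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(protein: str, ph: float) -> float:
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    protein = protein.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # free carboxy terminus
    for aa, pka in PKA_POSITIVE.items():
        charge += protein.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= protein.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(protein: str, low: float = 0.0, high: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge decreases monotonically as pH rises."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(protein, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


print(round(isoelectric_point("G"), 2))
print(round(isoelectric_point("VIFNSR"), 2))
```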
Smad7 protein was obtained by the procedures as described below. 

Smad7 cDNA shown in SEQ ID NO: 1 was ligated downstream of a promoter region in a known expression plasmid vector, the vector was introduced into E. coli cells by transformation, and the plasmid DNA was then purified from these cells. The plasmid DNA was then introduced into cultured cells to produce Smad7 protein. As the vector for the expression of Smad7, pactEF-mSmad7 (Fig. 1), constructed by the present inventors, was used. An E. coli strain transformed with this vector, Escherichia coli (pactEF-mSmad7), has been deposited with the National Institute of Bioscience and Human Technology (deposition date: April 8, 1997; accession number: FERM P-16187). Smad7 protein can also be synthesized in E. coli cells transformed with Smad7 cDNA. Many vectors for expression in E. coli cells are known, and a desired expression vector can be constructed by inserting Smad7 cDNA into one of them. Such known vectors include, for example, the pET series vectors, pKC30 and the like (Sambrook, J. et al., 1989, "Molecular Cloning", Cold Spring Harbor Laboratory Press, USA).

In addition, a fusion protein in which all or part of Smad7 protein is fused to another amino acid sequence can also be expressed. To this end, a gene or oligonucleotide encoding the amino acid sequence to be added is ligated to all or part of the Smad7 cDNA so that the reading frames of both sequences agree with each other.
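The requirement that the reading frames agree means that a sequence added in front of the Smad7 coding region must be a multiple of three nucleotides long and must contain no in-frame stop codon. A minimal Python sketch of that check, applied for illustration to the Myc-tag oligomer described in Example 2, follows; the helper name `keeps_frame` is hypothetical, and the translation is computed here rather than taken from the source.

```python
# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def keeps_frame(added_dna: str) -> bool:
    """True if added_dna can sit directly before a coding sequence without
    shifting its reading frame or introducing a stop codon."""
    return len(added_dna) % 3 == 0 and "*" not in translate(added_dna)


# Oligomer inserted just before the Smad7 initiation codon (Example 2).
myc_linker = "ATGTCTGAGCAGAAGCTGATCTCTGAGGAAGACCTTGGAGCTAGCACC"
print(len(myc_linker), keeps_frame(myc_linker), translate(myc_linker))
```

The translated linker contains the Myc-tag epitope Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu (EQKLISEEDL) in frame with the downstream Smad7 codons.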

Furthermore, the Smad7 cDNA shown in SEQ ID NO: 1 may be transcribed into RNA, and the RNA so obtained may be added to an intracellular or cell-free translation system to synthesize Smad7 protein. Many vectors for transcribing cloned DNA into RNA are known; for example, SP64, pIBI31, pGEM3 and the like may be used for this purpose (Sambrook, J. et al., 1989, "Molecular Cloning", Cold Spring Harbor Laboratory Press, USA). As RNA polymerases, those derived from bacteriophages SP6, T3 and T7 may be used. As a system for synthesizing protein from the synthesized sense-strand RNA, a system using oocytes of Xenopus laevis is known (Mayumi Nishizawa, Noriyuki Sakata, 1992, "in vitro-No-Tanpakushitu-No-Seigousei", In "Shin-Seikagaku-Jikken-Kouza, 1, Tanpakushitu VI", edited by The Japanese Biochemical Society, Tokyo Kagaku Dojin). Similarly, rabbit reticulocyte lysate may be used as a cell-free translation system (Kozak, M., 1990, Nucleic Acids Res., 18, 2828). In another embodiment, the Smad7 cDNA shown in SEQ ID NOs: 1, 2 and 3, or a part thereof, may be used as a template to synthesize an antisense RNA for Smad7. Such antisense RNA may be used for the diagnosis of Smad7-related pathologies. Furthermore, by ligating DNA having an appropriate sequence to a transcription vector containing a Smad7 sequence, the antisense RNA may be synthesized as an RNA molecule having ribozyme activity.
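At the sequence level, a sense-strand RNA has the coding-strand sequence with U in place of T, while an antisense RNA, written 5' to 3', is its reverse complement. A minimal sketch, applied for illustration to the 5'-terminal coding sequence Met Phe Arg Thr Lys Arg as it appears in the sense cloning primer of Example 1 (after the NheI site and ACC); the function names are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def sense_rna(sense_dna: str) -> str:
    """Sense-strand RNA: the coding-strand sequence with U in place of T."""
    return sense_dna.upper().replace("T", "U")


def antisense_rna(sense_dna: str) -> str:
    """Antisense RNA (5'->3'): reverse complement of the coding strand, in RNA."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(sense_dna.upper()))


# 5'-terminal coding sequence (Met Phe Arg Thr Lys Arg) from the cloning primer.
segment = "ATGTTCAGGACCAAACGA"
print(sense_rna(segment))
print(antisense_rna(segment))
```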

According to known techniques, one skilled in the art can obtain mutant proteins in which one or more amino acid residues are deleted, substituted or inserted in the amino acid sequence shown in SEQ ID NO: 4, by introducing mutations into the DNA shown in SEQ ID NO: 1 of the Sequence Listing, for example by site-directed mutagenesis. Among such mutant proteins, those retaining signal transduction activity are included within the scope of the present invention.
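At the sequence level, a substitution introduced by site-directed mutagenesis amounts to replacing one codon of the coding sequence and re-translating it. The sketch below, with a hypothetical helper name and a hypothetical Ile422-to-Ala change chosen only for illustration, applies such a codon swap to the last 21 nucleotides of SEQ ID NO: 1. It says nothing about whether a given mutant retains signal transduction activity; that has to be determined experimentally.

```python
# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def substitute_codon(coding_dna: str, codon_number: int, new_codon: str) -> str:
    """Replace the codon at 1-based position codon_number with new_codon."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    start = 3 * (codon_number - 1)
    if start < 0 or start + 3 > len(coding_dna):
        raise IndexError("codon_number lies outside the sequence")
    return coding_dna[:start] + new_codon.upper() + coding_dna[start + 3:]


tail = "GTCATCTTCAACAGCCGGTAG"                 # codons 421-427 of SEQ ID NO: 1
mutant = substitute_codon(tail, 2, "GCC")      # hypothetical Ile422 -> Ala
print(translate(tail), translate(mutant))
```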
EXAMPLES 

The present invention is further illustrated by the following Examples.

Example 1: Cloning of Mus musculus Smad7 cDNA
1) Design of primer oligonucleotides and PCR


A gene-specific antisense oligomer GSP1 (Gene Specific Primer 1) having the sequence:
5'-GTT(A/C/G/T)A(A/G)GTG(A/C/G/T)AC(C/T)TC(A/C/G/T)A(G/T)CCAGCA-3'
was synthesized on the basis of the following amino acid sequence:

Pro Cys Trp Leu/Ile Glu Val/Ile His Leu Asn, which is well conserved among Smad proteins. Similarly, another gene-specific antisense oligomer GSP2 having the sequence:

5'-GTA(A/C/G/T)(C/ll(A/C)(A/C/GyT)G(C/G)(A/C/G/10CCCC 
was synthesized on the basis of the following amino acid sequence: 
Phe Val Lys Gly Trp Gly Ala/Pro/Cys/Glu Thr. 
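A degenerate primer is a mixture containing one sequence for each combination of the bases listed in parentheses. The sketch below, assuming GSP1 reads 5'-GTT(A/C/G/T)A(A/G)GTG(A/C/G/T)AC(C/T)TC(A/C/G/T)A(G/T)CCAGCA-3', expands the mixture, counts its members, and checks that the reverse complement of each member (the sense strand) encodes the targeted stretch Cys Trp (Leu/Ile) Glu Val His Leu Asn. The helper names are hypothetical. Because of the wobble positions, some members also encode Met or Phe at the variable positions, which is expected for degenerate primers.

```python
from itertools import product
from typing import List

# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_degenerate(primer: str) -> List[List[str]]:
    """Split e.g. 'GT(A/G)C' into per-position base choices."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            positions.append(primer[i + 1:close].split("/"))
            i = close + 1
        else:
            positions.append([primer[i]])
            i += 1
    return positions


def expand(primer: str) -> List[str]:
    """All concrete sequences contained in a degenerate primer mixture."""
    return ["".join(choice) for choice in product(*parse_degenerate(primer))]


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Assumed reading of the antisense primer GSP1, written 5'->3'.
GSP1 = "GTT(A/C/G/T)A(A/G)GTG(A/C/G/T)AC(C/T)TC(A/C/G/T)A(G/T)CCAGCA"
variants = expand(GSP1)
# The reverse complement of an antisense primer is the coding strand.
encoded = {translate(reverse_complement(v)) for v in variants}

print(len(variants), sorted(encoded))
```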

The first PCR was performed using an adapter oligonucleotide-ligated cDNA mixture derived from Mus musculus 17-day embryos (manufactured by CLONTECH, USA) as the template pool, with the following adapter-specific oligomer 1:
(AP1): 5'-CCATCCTAATACGACTCACTATAGGGC-3'
as the sense oligomer and the above GSP1 as the antisense oligomer. The reaction conditions were as follows: 180 seconds at 96°C; then 5 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 72°C for 240 seconds; 5 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 70°C for 240 seconds; and then 25 cycles of denaturation at 96°C for 30 seconds, annealing at 60°C for 30 seconds and elongation at 68°C for 120 seconds.
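The first PCR lowers the annealing/elongation temperature in steps (72°C, then 70°C, then 60°C annealing with 68°C elongation), a scheme in the spirit of touchdown PCR. Writing the program as data makes its structure and its total hold time explicit. The sketch below ignores ramp times between temperatures, which depend on the thermal cycler, and the `Stage` representation is a hypothetical one chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """One block of a thermal-cycling program."""
    cycles: int
    steps: List[Tuple[float, int]]  # (temperature in deg C, hold time in s)

    def seconds(self) -> int:
        return self.cycles * sum(hold for _, hold in self.steps)


# First PCR of Example 1, written as data (ramp times not included).
first_pcr = [
    Stage(1, [(96, 180)]),                        # initial denaturation
    Stage(5, [(96, 30), (72, 240)]),              # annealing/elongation at 72 deg C
    Stage(5, [(96, 30), (70, 240)]),              # annealing/elongation at 70 deg C
    Stage(25, [(96, 30), (60, 30), (68, 120)]),   # anneal at 60, elongate at 68
]

total_seconds = sum(stage.seconds() for stage in first_pcr)
print(total_seconds, total_seconds / 60)
```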

Then, the DNA molecules contained in 1/500 volume of the resulting reaction mixture were used as templates in the second PCR, in which the following adapter-specific oligomer 2:

(AP2): 5'-ACTCACTATAGGGCTCGAGCGGC-3'
and the above GSP2 were used as the sense and antisense oligomers, respectively. The reaction conditions were as follows: 2 minutes at 96°C, then 20 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 68°C for 120 seconds. In these PCRs, the Expand High Fidelity PCR System (BOEHRINGER MANNHEIM, Germany) (Barnes, W. M., 1994, Proc. Natl. Acad. Sci. USA, 91, 2216-2220) was used as the thermostable DNA polymerase, and a PCR Thermal Cycler MP (TAKARA SHUZO) was used as the thermal cycling apparatus.

Analysis of the nucleotide sequence of the approximately 1,200 bp DNA fragment obtained in the above PCR revealed that it contained sequences encoding both of the following amino acid sequences conserved among the Smad family:
Lys Lys Leu Lys Glu, and
Arg Trp Pro Asp Leu.

The DNA fragment was therefore considered to be derived from a cDNA encoding a Smad family protein.
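The two conserved motifs can be located in the deduced amino acid sequence once the three-letter notation of the Sequence Listing is converted to one-letter codes. The sketch below does this for two lines of SEQ ID NO: 4 (residues 97-112 and 161-176), taking each line's first residue number from the numbering in the listing; the function names are hypothetical.

```python
from typing import Optional

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert space-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


def find_motif(segment: str, first_residue: int, motif: str) -> Optional[int]:
    """Residue number of the motif's first residue, or None if absent."""
    index = segment.find(motif)
    return None if index < 0 else first_residue + index


line_97 = to_one_letter(
    "His Ser Val Leu Lys Lys Leu Lys Glu Arg Gln Leu Glu Leu Leu Leu")
line_161 = to_one_letter(
    "Lys Val Phe Arg Trp Pro Asp Leu Arg His Ser Ser Glu Val Lys Arg")

print(find_motif(line_97, 97, "KKLKE"), find_motif(line_161, 161, "RWPDL"))
```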

2) Analysis of the 5' and 3' terminal regions

On the basis of the nucleotide sequence obtained in the above item 1), the following Smad-specific antisense oligomer:

5'-GGGGAGGAGGGGAGATGGTTTGGTGG-3' 

was synthesized in order to perform 5'-RACE (rapid amplification of 5'-cDNA ends) (Frohman, M. A., 1993, Methods Enzymol., 218, 340-358). Similarly, the following Smad-specific sense oligomer:

5'-TTGATGGAGGAAGGATGGAGGGGTTTG-3' 
was also synthesized in order to perform 3'-RACE (rapid amplification of 3'-cDNA ends) (Frohman, M. A., 1993, Methods Enzymol., 218, 340-358). In these reactions, the same cDNA mixture derived from Mus musculus 17-day embryos as that used in the above item 1) was used as the template source, together with AP2 and the Smad-specific antisense oligomer for 5'-RACE, or together with the Smad-specific sense oligomer and AP2 for 3'-RACE. The reaction conditions were as follows: 180 seconds at 96°C; then 5 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 72°C for 240 seconds; 5 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 70°C for 240 seconds; and then 25 cycles of denaturation at 96°C for 30 seconds, annealing at 60°C for 30 seconds and elongation at 68°C for 240 seconds.

Nucleotide sequencing of the approximately 250 bp DNA fragment obtained by this 5'-RACE revealed that it contains an initiation codon at the position conserved among the Smad family, together with the 5' non-coding region shown in SEQ ID NO: 2. Similarly, nucleotide sequencing of the approximately 400 bp DNA fragment obtained by the above 3'-RACE revealed that it contains a termination codon and the 3' non-coding region shown in SEQ ID NO: 3.
3) Cloning of Smad7 coding region 

On the basis of the nucleotide sequence determined in the above item 2), the following specific sense oligomer (to which an NheI site for cloning has been added at the 5'-end):

5'-CCGCTAGCACCATGTTCAGGACCAAACGATCTGCGCTCGTC-3' 
and the following antisense oligomer (to which a BamHI site for cloning has been added at the 5'-end): 
5'-CCGGATCCTATCGCGAGTTGAAGATGACCTCCAGCCAGCACG-3' 

were prepared. With these two oligomers, PCR was performed using the same cDNA mixture as that described in the above item 1) as the template source. The reaction conditions were as follows: 180 seconds at 96°C; then 5 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 72°C for 60 seconds; 5 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 70°C for 60 seconds; and then 25 cycles of denaturation at 96°C for 30 seconds and annealing/elongation at 68°C for 60 seconds.
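The two cloning primers carry the recognition sequences GCTAGC (NheI) and GGATCC (BamHI) behind two extra bases (CC) at their 5' ends; such short extensions are commonly added because many restriction enzymes cut poorly at the very end of a DNA molecule. A minimal sketch that locates the sites and the ATG initiation codon in the primers, reporting 0-based offsets from the 5' end (a convention chosen here for illustration):

```python
import re
from typing import Dict, List

# Recognition sequences of the two enzymes named in Example 1.
SITES = {"NheI": "GCTAGC", "BamHI": "GGATCC"}


def find_sites(primer: str) -> Dict[str, List[int]]:
    """0-based start positions of each recognition site in a primer."""
    primer = primer.upper()
    # The lookahead lets overlapping occurrences be reported as well.
    return {name: [m.start() for m in re.finditer(f"(?={site})", primer)]
            for name, site in SITES.items()}


sense = "CCGCTAGCACCATGTTCAGGACCAAACGATCTGCGCTCGTC"
antisense = "CCGGATCCTATCGCGAGTTGAAGATGACCTCCAGCCAGCACG"

print(find_sites(sense), sense.find("ATG"))
print(find_sites(antisense))
```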

About 0.5 µg of the approximately 1,300 bp DNA fragment obtained in the above reaction was treated with the restriction enzymes NheI (New England Biolabs, USA) and BamHI (New England Biolabs, USA) at 37°C for one hour and then purified by gel electrophoresis in 0.7% low-melting-point agarose. The DNA fragment was then cloned using the known plasmid vector pIBIA (International Biotechnologies, Inc., USA; it may be prepared from pIBI31 according to the method of Furuno, N. et al., 1994, EMBO J., 13, 2399-2410) and the host cells JM109 (TOYOBO, Japan), derived from E. coli strain K-12 (Sambrook, J. et al., 1989, "Molecular Cloning", Cold Spring Harbor Laboratory Press, USA), to obtain the plasmid pIBIA-mSmad7 (Fig. 3). An E. coli strain transformed with this plasmid, Escherichia coli (pIBIA-mSmad7), has been deposited at the National Institute of Bioscience and Human Technology (deposition date: April 8, 1997; accession number: FERM P-16188). The deposit was converted to an international deposit under the Budapest Treaty on March 30, 1998, and assigned the new accession number FERM BP-6317. This plasmid DNA was then isolated, and the nucleotide sequence of the cDNA was determined. The nucleotide sequence is shown in SEQ ID NO: 1. Furthermore, Smad7 cDNA having the sequence shown in SEQ ID NO: 1 was also obtained from Mus musculus 11-day embryos using identical procedures.

Example 2: Construction of Smad7 expression vector 


The Smad7 cDNA obtained above, which contains the entire coding region, was inserted into pactEF (BOEHRINGER MANNHEIM, Germany; it may be prepared from pEMBL9(+) according to the method of Okazaki, K. and Sagata, N., 1995, EMBO J., 14, 5048-5059), an expression vector for cultured animal cells, to construct pactEF-mSmad7 (Fig. 1). As above, an E. coli strain transformed with this vector, Escherichia coli (pactEF-mSmad7), has been deposited with the National Institute of Bioscience and Human Technology (deposition date: April 8, 1997; accession number: FERM P-16187). The deposit was converted to an international deposit under the Budapest Treaty on March 30, 1998, and assigned the new accession number FERM BP-6316. In Fig. 1, β-Actin promoter/EF-1α enhancer denotes the transcriptional promoter of the beta-actin gene derived from the chicken genome combined with the enhancer of the elongation factor 1 alpha gene derived from the human genome; f1 ori denotes the DNA replication origin of phage f1; bla denotes a beta-lactamase gene (conferring sulbenicillin and ampicillin resistance); ori denotes the DNA replication origin derived from a pUC plasmid; poly A denotes the transcription termination and polyadenylation signal derived from SV40; Smad7 denotes Smad7 cDNA; and HindIII, BanIII, ScaI and BamHI denote the sites at which the DNA is cleaved by the respective restriction enzymes.

In addition, a vector pactEF-Myc-mSmad7 was prepared for expression of a fusion protein in which a fragment containing the following sequence of a known epitope peptide, Myc-tag (Evan, G. I. et al., 1985, Mol. Cell. Biol., 5, 3610-3616):

Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu

has been added to the N-terminus of Smad7. An E. coli strain transformed with this vector, Escherichia coli (pactEF-Myc-mSmad7), has been deposited with the National Institute of Bioscience and Human Technology (deposition date: April 8, 1997; accession number: FERM P-16186). The deposit was converted to an international deposit under the Budapest Treaty on March 30, 1998, and assigned the new accession number FERM BP-6315. To prepare this vector, the following DNA oligomer:

5'-ATGTCTGAGCAGAAGCTGATCTCTGAGGAAGACCTTGGAGCTAGCACC-3' 
was inserted immediately before the translation initiation codon of Smad7. This vector DNA was then introduced into mouse NIH3T3 cells by the calcium phosphate method (Graham, F. L. and van der Eb, A. J., 1973, Virology, 52, 456-467). After 48 hours, the whole-cell extract was separated by denaturing polyacrylamide gel electrophoresis (Laemmli, U. K. et al., 1970, J. Mol. Biol., 49, 99-113) and analyzed by immunoblotting (Harlow, E. and Lane, D., 1988, "Antibodies", Cold Spring Harbor Laboratory Press, USA) using an anti-Myc-tag monoclonal antibody (SANTA CRUZ, USA). The electrophoretic analysis revealed expression of the fusion protein between Smad7 and Myc-tag as a band at the position corresponding to a molecular weight of about 48,000 (Fig. 2). In Fig. 2, Lane M shows molecular weight markers; Lane 1 shows the fusion protein between Smad7 and Myc-tag peptide synthesized in animal cells.
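The band position can be compared with the calculated molecular weight of Smad7. Translated in frame, the inserted oligomer reads MSEQKLISEEDLGAST (a translation made here, not stated in the source), i.e. 16 residues added in front of the Smad7 initiator methionine. A rough check, assuming commonly tabulated average residue masses and the calculated Smad7 value of 46516, is sketched below; gel mobility only approximates molecular weight, so agreement within a few percent is all that can be expected.

```python
# Subset of average residue masses (Da) needed for the tag; assumed values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "T": 101.1051, "L": 113.1594,
    "I": 113.1594, "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155,
    "M": 131.1926,
}

SMAD7_MW = 46516          # calculated molecular weight of Smad7 (SEQ ID NO: 4)
TAG = "MSEQKLISEEDLGAST"  # in-frame translation of the inserted oligomer

# Residues are added inside the same chain, so no extra water is counted.
added = sum(AVERAGE_RESIDUE_MASS[aa] for aa in TAG)
fusion_mw = SMAD7_MW + added
print(round(added, 1), round(fusion_mw))
```

The estimate of roughly 48,200 is in line with the band observed at about 48,000.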

Example 3: Synthesis of Smad7 RNA 


The vector pIBIA, into which the Smad7 cDNA was cloned in Example 1, contains a promoter sequence for phage T7 RNA polymerase upstream of the cDNA and a promoter sequence for phage T3 RNA polymerase downstream of the cDNA (Furuno, N. et al., 1994, EMBO J., 13, 2399-2410). Therefore, to obtain a sense-strand RNA of Smad7, 2 µg of the plasmid pIBIA-mSmad7 (Fig. 3) described in Example 1, obtained by cloning Smad7 cDNA into pIBIA, was treated with the restriction enzyme BamHI at 37°C for one hour, and the linearized plasmid so obtained was subjected to transcription using T7 RNA polymerase (Ambion, USA). The synthesized RNA was homogeneous on gel electrophoresis (Fig. 4). Similarly, 2 µg of the above plasmid was treated with the restriction enzyme HindIII at 37°C for one hour, and the linearized plasmid so obtained was subjected to transcription using T3 RNA polymerase (Ambion, USA). The synthesized antisense RNA was homogeneous on gel electrophoresis (Fig. 4). In Fig. 3, f1 ori denotes the DNA replication origin of phage f1; bla denotes a beta-lactamase gene (conferring sulbenicillin and ampicillin resistance); ori denotes the DNA replication origin derived from a pUC plasmid; pT7 denotes a promoter sequence for phage T7 RNA polymerase; Smad7 denotes Smad7 cDNA; pT3 denotes a promoter sequence for phage T3 RNA polymerase; and FspI, SspI, ScaI, HindIII, BamHI, EcoRI and AatII denote the sites at which the DNA is cleaved by the respective restriction enzymes. In Fig. 4, Lane 1 shows Smad7 mRNA (sense-strand RNA), and Lane 2 shows Smad7 antisense RNA.

Example 4: Synthesis of Smad7 protein 

About 1 µg of the Smad7 sense-strand RNA obtained in Example 3 was added to a cell-free lysate derived from rabbit reticulocytes (Promega, USA) together with 35S-labeled amino acids (Amersham, UK), and the mixture was incubated for translation at 30°C for one hour to obtain Smad7 protein. Denaturing polyacrylamide gel electrophoresis of the product alongside molecular weight markers revealed that the synthesized Smad7 was a homogeneous protein with a molecular weight of about 47,000 (Fig. 5). In Fig. 5, Lane M shows molecular weight markers; Lane 1 shows the Smad7 protein; Lane 2 shows a 2-fold amount of the Smad7 protein relative to Lane 1; Lane 3 shows a 4-fold amount relative to Lane 1; and the arrowhead indicates the position of the Smad7 protein.

As shown in Fig. 6, the amino acid sequence of the novel factor of the present invention, Smad7, contains a region highly homologous to the previously known signal transduction factor Smad1. This highly homologous region is also well conserved in other Smad family proteins, suggesting that it is essential for Smad activity. Smad7, which has this region, is therefore believed to retain the function of a signal transduction factor.


Example 5 

Using as an antigen the Smad7 protein prepared from the expression vector obtained by the procedures described above, antisera specific to Smad7 protein may be obtained by immunizing rabbits according to a known method (Harlow, E. and Lane, D., 1988, "Antibodies", Cold Spring Harbor Laboratory Press, USA). The antisera so obtained may be further affinity-purified using Smad7 protein as the affinity ligand according to a known procedure (Harlow, E. and Lane, D., 1988, "Antibodies", Cold Spring Harbor Laboratory Press, USA) to obtain an inhibitory antibody highly specific to Smad7 protein. This inhibitory antibody may be added to a reaction mixture in order to assay the activity of Smad7. Furthermore, this inhibitory antibody may be microinjected into living cells by a known method (Capecchi, M., 1980, Cell, 22, 479-488) in order to confirm the signal transduction activity in the cells. Similarly, the pactEF-mSmad7 vector DNA described above (Fig. 1) may also be injected directly into cells in order to assay the activity of the expressed product, i.e., Smad7 protein.

Thus, the novel signal transduction factor of the present invention and the gene encoding it are useful as pharmaceutical or diagnostic agents.
ACGCGCACCG CGTGCCTCCT GCTGCCCGGC CGCCTGGACT GCAGGCTGGG CCCGGGGGCG 420

CCCGCCAGCG CGCAGCCCGC GCAGCCGCCC TCGTCCTACT CGCTCCCCCT CCTGCTGTGC 480 

AAAGTGTTCA GGTGGCCGGA TCTCAGGCAT TCCTCGGAAG TCAAGAGGCT GTGTTGCTGT 540

GAATCTTACG GGAAGATCAA CCCCGAGCTG GTGTGCTGCA ACCCCCATCA CCTTAGTCGA 600 

CTCTGTGAAC TAGAGTCTCC CCCTCCTCCT TACTCCAGAT ACCCAATGGA TTTTCTCAAA 660 

CCAACTGCAG GCTGTCCAGA TGCTGTACCT TCCTCCGCGG AAACCGGGGG AACGAATTAT 720

CTGGCCCCTG GGGGGCTTTC AGATTCCCAA CTTCTTCTGG AGCCTGGGGA TCGGTCACAC 780 

TGGTGCGTGG TGGCATACTG GGAGGAGAAG ACTCGCGTGG GGAGGCTCTA CTGTGTCCAA 840 

GAGCCCTCCC TGGATATCTT CTATGATCTA CCTCAGGGGA ATGGCTTTTG CCTCGGACAG 900 

CTCAATTCGG ACAACAAGAG TCAGCTGGTA CAGAAAGTGC GGAGCAAGAT CGGCTGTGGC 960 

ATCCAGCTGA CGCGGGAAGT GGATGGCGTG TGGGTTTACA ACCGCAGCAG TTACCCCATC 1020

TTCATCAAGT CCGCCACACT GGACAACCCG GACTCCAGGA CGCTGTTGGT GCACAAAGTG 1080 

TTCCCTGGTT TCTCCATCAA GGCTTTTGAC TATGAGAAAG CCTACAGCCT GCAGCGGCCC 1140

AATGACCACG AGTTCATGCA GCAACCATGG ACGGGTTTCA CCGTGCAGAT CAGCTTTGTG 1200 

AAGGGCTGGG GCCAGTGCTA CACCCGCCAG TTCATCAGCA GCTGCCCGTG CTGGCTGGAG 1260 

GTCATCTTCA ACAGCCGGTA G 1281 
(2) INFORMATION FOR SEQ ID NO: 2: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 209 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 






(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mus musculus 

(B) STRAIN: Swiss-Webster/NIH 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

CGGCGCCCGC GCGCGCCCCG GCCTCTGGGA GACTGGCGCA TGCCACGGAG CGCCCCTCGG 60

GCCGCCGCCG CTTCTGCCCG GGCCCCTGCT GTTGCTGCTG TCGCCTGCGC CTGCTGCCCC 120

AACTCGGCGC CCGACTTCTT CATGGTGTGC GGAGGTCATG TTCGCTCCTT AGCCGGCAAA 180

CGACTTTTCT CCTCGCCTCC TCGCCCCGC 209 
Gly Ala Gly Ala Ala Gly Gly Ala Glu Ala Asp Leu Lys Ala Leu Thr

85 90 95 

His Ser Val Leu Lys Lys Leu Lys Glu Arg Gln Leu Glu Leu Leu Leu

100 105 110 

Gln Ala Val Glu Ser Arg Gly Gly Thr Arg Thr Ala Cys Leu Leu Leu
115 120 125 

Pro Gly Arg Leu Asp Cys Arg Leu Gly Pro Gly Ala Pro Ala Ser Ala

130 135 140 

Gln Pro Ala Gln Pro Pro Ser Ser Tyr Ser Leu Pro Leu Leu Leu Cys
145 150 155 160 

Lys Val Phe Arg Trp Pro Asp Leu Arg His Ser Ser Glu Val Lys Arg

165 170 175 

Leu Cys Cys Cys Glu Ser Tyr Gly Lys Ile Asn Pro Glu Leu Val Cys
180 185 190 

Cys Asn Pro His His Leu Ser Arg Leu Cys Glu Leu Glu Ser Pro Pro

195 200 205 

Pro Pro Tyr Ser Arg Tyr Pro Met Asp Phe Leu Lys Pro Thr Ala Gly 
210 215 220 

Cys Pro Asp Ala Val Pro Ser Ser Ala Glu Thr Gly Gly Thr Asn Tyr

225 230 235 240 

Leu Ala Pro Gly Gly Leu Ser Asp Ser Gln Leu Leu Leu Glu Pro Gly
245 250 255 
Asp Arg Ser His Trp Cys Val Val Ala Tyr Trp Glu Glu Lys Thr Arg
260 265 270

Val Gly Arg Leu Tyr Cys Val Gln Glu Pro Ser Leu Asp Ile Phe Tyr

275 280 285 

Asp Leu Pro Gln Gly Asn Gly Phe Cys Leu Gly Gln Leu Asn Ser Asp
290 295 300 

Asn Lys Ser Gln Leu Val Gln Lys Val Arg Ser Lys Ile Gly Cys Gly
305 310 315 320 

Ile Gln Leu Thr Arg Glu Val Asp Gly Val Trp Val Tyr Asn Arg Ser
325 330 335 

Ser Tyr Pro Ile Phe Ile Lys Ser Ala Thr Leu Asp Asn Pro Asp Ser
340 345 350 

Arg Thr Leu Leu Val His Lys Val Phe Pro Gly Phe Ser Ile Lys Ala
355 360 365 

Phe Asp Tyr Glu Lys Ala Tyr Ser Leu Gln Arg Pro Asn Asp His Glu
370 375 380 

Phe Met Gln Gln Pro Trp Thr Gly Phe Thr Val Gln Ile Ser Phe Val
385 390 395 400 
Lys Gly Trp Gly Gln Cys Tyr Thr Arg Gln Phe Ile Ser Ser Cys Pro
405 410 415 



Cys Trp Leu Glu Val Ile Phe Asn Ser Arg
420 425 
Claims

1. A protein having a signal transduction activity which comprises the amino acid sequence shown in SEQ ID NO: 4, or a mutant of said protein retaining the signal transduction activity, which is obtained by introducing into said amino acid sequence a deletion, substitution or insertion of one or more amino acid residues.

2. A nucleic acid encoding the amino acid sequence shown in SEQ ID NO: 4. 

3. A nucleic acid of claim 2 which comprises the DNA sequence shown in SEQ ID NO: 1.


4. An expression vector comprising the nucleic acid of claim 2 or 3. 

5. A transformant containing the expression vector of claim 4. 

6. A DNA comprising the DNA sequence shown in SEQ ID NO: 2.

7. A DNA comprising the DNA sequence shown in SEQ ID NO: 3. 

8. A pharmaceutical composition comprising the protein of claim 1 or the nucleic acid of claim 2 or 3 and optionally a pharmaceutically acceptable carrier.

9. A diagnostic composition comprising the protein of claim 1 or the nucleic acid of claim 2 or 3.
Fig. 3
Fig. 4